Convex Optimization 2015.09.23

Positive (semi)definite metrics

  • Notations: \(A \in S^n\) means \(A \in \mathbb{R}^{n \times n}\), \(A\) symmetric

    \(A \ge 0\) or \(A \in S_+^n\) means \(A\) positive semidefinite (\(x^TAx \ge 0, \forall x \in \mathbb{R}^n\))

    \(A \gt 0\) or \(A \in S_{++}^n\) means \(A\) positive definite (\(x^TAx \gt 0, \forall x \in \mathbb{R}^n \backslash \{0\}\))

  • Reminder: A matrix \(A\) is symmetric iff it is orthogonally diagonalizable: \(A = P^TDP, P^P = I\)

  • Facts: \(A \ge 0 \iff D \ge 0 \iff\) all eigenvalues of A are nonnegative

    \(A \gt 0 \iff D \gt 0 \iff\) all eigenvalues of A are positive

  • Proof: If \(D \ge 0\), then \(\exists D' \ge 0, s.t. D = D'^2\)

    Then \(x^TAx = x^TP^TD'D'Px = (D'Px)^TD'Px = \lVert D'Px \rVert _2^2 \ge 0\)

  • Notations: Outer product (of \(x, y\)): \(xy^T = \begin{bmatrix} \\ x\\ \\ \end{bmatrix} \begin{bmatrix} & y & \\ \end{bmatrix} = \begin{bmatrix} x_1y_1 & \cdots & x_1y_n \\ \vdots & & \vdots \\ x_ny_1 & \cdots & x_ny_n\\ \end{bmatrix}\)

    Let \(A = [a_1^T \cdots a_n^T]^T\), \(B = [b_1 ... b_n]\)

    Then \(BA = \begin{bmatrix} b_1 & \cdots & b_n\\ \end{bmatrix} \begin{bmatrix} a_1^T\\ \vdots \\ a_n^T\\ \end{bmatrix} = b_1a_1^T + ... + b_na_n^T\)

  • Lemma: If \(A \ge 0\) and \(A_{ii} = 0\), then the ith row and column of \(A\) are zero.

  • Picture:

    \(\begin{bmatrix} & & 0 & \\ & & 0 & \\ 0 & 0 & 0 & 0 \\ & & 0 & \\ \end{bmatrix}\)

  • Proof: WLOG \(i = 1\), If \(A_{1j} \neq 0\), then let \(x = \vec{e_1} + \lambda \vec{e_j}\).

    Then \(x^TAx = \lambda ^2 A_{jj} + 2 \lambda A_{1j}\)

    \(\frac{d}{d \lambda} x^TAx \underset{\lambda \neq 0}{=} (2 \lambda A_{jj} + 2A_{1j}) |_{\lambda = 0} = 2A_{1j} \neq 0, (x^TAx)|_{\lambda = 0} = 0\)

  • Lemma: Any matrix of the form \(B^TB\) is positive semidefinite, \(B \in \mathbb{R}^{m \times n}\)

  • Proof: Trivial (\(x^TB^TBx = \lVert Bx \rVert _2^2\))

  • Fact: For every \(A \in S_+^n\) there exists \(B \in \mathbb{R}^{n \times n}\), \(B\) upper triangle, such that \(A = B^TB\) (Cholesky decomposition)

    $B^TB =

    \[\begin{bmatrix} | & & & \\ | & | & & \\ | & | & | & \\ | & | & | & |\\ \end{bmatrix} \begin{bmatrix} - & - & - & -\\ & - & - & - \\ & & - & - \\ & & & - \\ \end{bmatrix}\]

    = A = $

    $

    \[\begin{bmatrix} | \\ | \\ | \\ | \\ \end{bmatrix} \begin{bmatrix} - & - & - & - \\ \end{bmatrix}\]
    • \[\begin{bmatrix} \\ | \\ | \\ | \\ \end{bmatrix} \begin{bmatrix} & - & - & - \\ \end{bmatrix}\]
      • = $
      \(\begin{bmatrix} \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \end{bmatrix} + \begin{bmatrix} & & & \\ & \ast & \ast & \ast \\ & \ast & \ast & \ast \\ & \ast & \ast & \ast \\ \end{bmatrix} + ... + \begin{bmatrix} & & & \\ & & & \\ & & & \\ & & & \ast \\ \end{bmatrix}\)
  • Proof: By lemma, can assume \(A_{11} \neq 0\), in fact, can assume that \(A_{11} = 1\), So

    \(A = \begin{bmatrix} 1 & a_{12}^T \\ a_{12} & A_{22} \\ \end{bmatrix}, B = \begin{bmatrix} 1 & & a_{12}^T & \\ & - & - & - \\ & & - & - \\ & & & - \\ \end{bmatrix}\)

    \(\begin{bmatrix} 1 \\ a_{12} \\ \end{bmatrix} \begin{bmatrix} 1 & a_{12}^T \\ \end{bmatrix} = \begin{bmatrix} 1 & a_{12}^T \\ a_{12} & a_{12}a_{12}^T \\ \end{bmatrix}\)

    Have \(A - \begin{bmatrix} 1 \\ a_{12} \\ \end{bmatrix} \begin{bmatrix} 1 & a_{12}^T \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & A_{22} - a_{12}a_{12}^T \\ \end{bmatrix}\)

    So need to show that:

    \(S = A_{22} - a_{12}a_{12}^T \in S_+^{n - 1}\) is positive semidefinite

    Let \(x \in \mathbb{R}^{n - 1}\),

    Then \(x^TSx = x^TAx - x^Ta_{12}a_{12}^Tx = x^TA - (a_{12}^Tx)^2\)

    Let \(y = \begin{bmatrix} -a_{12}^Tx \\ x \\ \\ \end{bmatrix},\)

    \(y^TAy = (a_{12}^Tx)^2 + x^TA_{22}x - (a_{12}^Tx)^2 - (a_{12}^Tx)^2 = x^TSx \ge 0\)

  • Basic Fact: \(x \rightarrow \sqrt{x^TAx}\) is a norm if \(A \gt 0\)

  • Proof: \(\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert\)

    \(\sqrt{(x + y)^TA(x + y)} \le \sqrt{x^TAx} + \sqrt{y^TAy}\)

    \(x^TAx + y^TAy + 2x^TAy \le x^TAx + y^TAy + 2 \sqrt{x^TAx} \sqrt{y^TAy}\)

    \((yA^Tx)(yA^Tx) \le (x^TAx)(y^TAy)\)

    Let \(z = Bx, w = By\),

    Then \((w^Tz)(w^Tz) \le (z^Tz)(w^Tw) = \lVert z \rVert _2^2 \lVert w \rVert _2^2\)

  • Basic Fact: The function \(x \rightarrow x^TAx\) is convex if \(A \ge 0\)

  • Proof: Same as for the function \(x^2\) (At some point use that \((x + y)^TA(x + y) \ge 0\))

  • Definition: An ellipsod in \(\mathbb{R}^n\) is a set of the form \(\mathcal{E} = \{x : (x - x_0)^TA^{-1}(x - x_0) \le 1\}\)

    for some \(A \gt 0\)

    Have \(A = P^TDP\), put \(x - x_0 = P^Tu\)

    \(A^{-1} = P^TD^{-1}P, (u^TP)A^{-1}(P^Tu) = u^TD^{-1}u = \sum_{i = 1}^n \frac{1}{\lambda_i} u_i^2\)

  • Lemma: A set \(\mathcal{E}\) is an ellipsoid iff and only if there exists an invertible matrix \(Z \in \mathbb{R}^{n \times n}, s.t. \mathcal{E} = \{x_0 + Zu : \lVert u \rVert _2 \le 1\}\)

  • Proof: \(\{x_0 + Zu : \lVert u \rVert _2 \le 1\} = \{ y : \lVert Z^{-1}(y - x_0) \rVert _2 \le 1\} = \{y : (y - x_0)^TZ^{-1T}Z^{-1}(y - x_0) \le 1\}\)

Hyperplane Seperation Theorem:

  • Theorem: If \(C, D \in \mathbb{R}^n\) are convex sets, disjoint, then there exists \(a \in \mathbb{R}^n \backslash {0}, b \in \mathbb{R}, s.t. a^Tx \le b, \forall x \in C; a^Tx \ge b, \forall x \in D\)

  • Proof: when \(C, D\) are closed and \(D\) is bonded,

    Let \(d(C, D) = \underset{x \in C, y \in D} {inf} \lVert x - y \rVert _2\),

    Then \(\exists (x_n)_{n = 1}^{\infty} \subseteq C, (y_n)_{n = 1}^{\infty} \subseteq D, s.t. \lim_{n \to \infty} \lVert x_n - y_n \rVert _2 = d(C, D)\)

    Even, can assume \((y_n)_{n = 1}^{\infty}\) is convergent.

    Then \(\lim_{n \to \infty} y_n \in D\) because \(D\) is closed

    can also assume that \(x_n\) convergent

    Now, let \(c = \lim_{n \to \infty} x_n \in C, d = \lim_{n \to \infty} y_n \in D\)

    Then \(\lVert c - d \rVert _2 = \lim_{n \to \infty} \lVert x_n - y_n \rVert _2 = d(C, D) \gt 0\)

    Choose \(a = d - c, b = (d - c)^T(\frac{d + c}{2})\)

    Assume by contradiction that \(\exists x_0 \in D, s.t. a^Tx_0 \lt b\)

    Hope is that some point of form \(d + \lambda(x_0 - d)\) is even closer to \(c\) than \(d\) is,

    for \(\lambda \in [0, 1], (d + \lambda (x_0 - d) - c)^T (d + \lambda (x_0 - d) - c)\)

    \(= d^Td - 2d^Tc + c^Tc + 2 \lambda (x_0 - d)^T (d - c) + {\lambda}^2(x_0 - d)^T(x_0 - d)\)

    \(= \lVert d - c \rVert _2^2 + 2 \lambda (x_0 - d)^T (d - c) + {\lambda}^2(x_0 - d)^T(x_0 - d)\)

    And \((d - c)^T(\frac{d - c}{2}) = \frac{\lVert d - c \rVert _2^2}{2} \ge 0, a^Tx_0 \lt b\)

    \(\implies (x_0 - d)^T - (d - c)^T(\frac{d + c}{2}) \lt 0\)

    So \((x_0 - d)^T - (d - c)^T(\frac{d + c}{2}) - (d - c)^T(\frac{d - c}{2}) \lt 0\)

    \(\implies \lambda = 0, \frac{d}{d \lambda} \cdots = (x_0 - d)^T(d - c) + 2 \lambda (x_0 - d)^T(x_0 - d) = (x_0 - d)^T(d - c) \lt 0\)

    \(\min \lVert Ax - b \rVert _2^2\) where \(A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m\)

    Have \(\lVert Ax - b \rVert _2^2 = (Ax - b)^T(Ax - b) = x^TA^TAx - 2b^TAx + b^Tb\)

  • Definition: \(\nabla f(x_1, \cdots, x_n) = [\frac{\partial}{\partial x_1} f, ..., \frac{\partial}{\partial x_n} f]^T\)

    \(\nabla b^Tb = 0, \nabla - 2b^TAx = -2A^Tb, \nabla x^TA^TAx = 2A^TAx (A^TAx - A^Tb = 0)\)